(See Obsidian note “Volatility Targeting”)

The more interesting stuff is way down below…

Volatility targeting

Example: We generate a signal with approx mean 2 and approx volatility 4.
We want to rescale the volatility to 1 and we want a mean of 10

Create vector of returns

set.seed(3746)
ret <- rnorm(1000, 2, 4)

Get mean - should be close to 2

mean(ret)
## [1] 1.976082

Get volatility - should be close to 4

sd(ret)
## [1] 4.073677

Standardise returns

std_ret <- ret / sd(ret)

Get mean of standardised returns

mean(std_ret)
## [1] 0.4850855

Get volatility of standardised returns - should be 1

sd(std_ret)
## [1] 1

So what if we want to control the mean while standardising the volatility?
First we shift the data to get a mean of 0:

demeaned_ret <- ret - mean(ret)

Check the mean of the “demeaned” returns - should be 0:

mean(demeaned_ret)
## [1] -5.997091e-17

Check the volatility of the “demeaned” returns - should be unchanged (i.e. 4.073677)

sd(demeaned_ret)
## [1] 4.073677

Now standardise the demeaned returns

std_demeaned_ret <- demeaned_ret / sd(demeaned_ret)

Check mean of standardised demeaned returns - should be 0

mean(std_demeaned_ret)
## [1] -1.438181e-17

Check volatility of std_demeaned_ret - should be 1

sd(std_demeaned_ret)
## [1] 1

Now shift the standardised demeaned returns to our target mean of 10.

normalized_ret <- std_demeaned_ret + 10

Check mean of normalized returns - should be 10

mean(normalized_ret)
## [1] 10

Check volatility of normalized returns - should be 1

sd(normalized_ret)
## [1] 1

The orange line in the resulting plot (not shown here) shows that we end up with only positive signals.
We don’t want the mean of the scaled signal to be 10. We want the mean of the absolute signal to be 10.

Scaling signal by mean of absolute signal

See pysystemtrade implementation:
https://github.com/robcarver17/pysystemtrade/blob/master/sysquant/estimators/forecast_scalar.py#L31

set.seed(3746)
sig <- rnorm(1000, 2, 8)

This is how pysystemtrade does it:
Find mean of abs signal

mean_abs_sig <- mean(abs(sig))
mean_abs_sig
## [1] 6.623671

Standardise signal: Divide signal by mean of absolute signal

std_sig <- sig/mean_abs_sig

Get mean of scaled signal

mean(std_sig)
## [1] 0.2947253

Get sd of scaled signal

sd(std_sig)
## [1] 1.230036

Get mean of positive scaled values

mean(std_sig[std_sig >= 0])
## [1] 1.071792

Get mean of negative scaled values

mean(std_sig[std_sig < 0])
## [1] -0.8904984

Get sd of positive scaled values

sd(std_sig[std_sig >= 0])
## [1] 0.789594

Get sd of negative scaled values

sd(std_sig[std_sig < 0])
## [1] 0.7369654

Scale signal to expected absolute value of 10

scaled_sig <- std_sig * 10

Mean of the scaled signal

mean(scaled_sig)
## [1] 2.947253

Mean of the absolute scaled signal

mean(abs(scaled_sig))
## [1] 10

sd of the scaled signal

sd(scaled_sig)
## [1] 12.30036

sd of the absolute scaled signal

sd(abs(scaled_sig))
## [1] 7.739106

Make plot (echo mean in each signal to be able to make vlines)

mean = 0

Compute the scaling factor for 20 simulated signals with the same input parameters.
Do this for different values of sd.

Means:

## [1] 12.50737
## [1] 125.2921
## [1] 1255.659
## [1] 12540.4
## [1] 125284

Note: When the mean of the raw signal is close to 0, the scaling factor is inversely proportional to the volatility of the raw signal.
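This can be checked with a quick sketch (the sd values below are illustrative assumptions, not the ones used in the simulation above): for a mean-zero Gaussian signal, mean(abs(sig)) is approximately sd * sqrt(2/pi), so the scaling factor target/mean(abs(sig)) is approximately target/(sd * sqrt(2/pi)).

```r
set.seed(1)
target <- 10
for (s in c(1, 0.1, 0.01)) {
  sig <- rnorm(100000, 0, s)
  ## Empirical scaling factor vs. the half-normal approximation
  print(c(sd = s,
          empirical = target / mean(abs(sig)),
          theory = target / (s * sqrt(2 / pi))))
}
```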

Diagnosis

What’s going on? Let’s look at how mean(abs(x)) depends on mean(x).

When the signal mean is far from 0 relative to its volatility (so that effectively all values have the same sign), the mean of the absolute values equals the absolute value of the mean:

set.seed(3654)
sgnl1 <- rnorm(10000, -100, 1)
mean(abs(sgnl1)) == abs(mean(sgnl1))
## [1] TRUE

But this does not hold for raw signals whose mean is close to 0 relative to their sd:

set.seed(3654)
sgnl2 <- rnorm(10000, 0, 1)
mean(sgnl2)
## [1] -0.006818511
mean(abs(sgnl2)) == abs(mean(sgnl2))
## [1] FALSE
set.seed(3654)
sgnl3 <- rnorm(10000, 0.4, 0.1)
set.seed(3654)
sgnl4 <- rnorm(10000, 0.5, 0.1)
mean(abs(sgnl3)) == abs(mean(sgnl3))
## [1] FALSE
mean(abs(sgnl4)) == abs(mean(sgnl4))
## [1] TRUE

mean(abs(sgnl3)) - abs(mean(sgnl3))

format(
  mean(abs(sgnl3)) - abs(mean(sgnl3)), 
  scientific=F,
  nsmall = 20
)
## [1] "0.00000080417612741135"

mean(abs(sgnl4)) - abs(mean(sgnl4))

format( 
 mean(abs(sgnl4)) - abs(mean(sgnl4)),
 scientific=F,
 nsmall = 20
)
## [1] "0.00000000000000000000"

The mean of the absolute values is also not equal to the absolute value of the mean for all values of sd:

set.seed(3654)
sgnl5 <- rnorm(10000, 1, 0.1)
set.seed(3654)
sgnl6 <- rnorm(10000, 1, 0.2)
set.seed(3654)
sgnl7 <- rnorm(10000, 1, 0.3)
set.seed(3654)
sgnl8 <- rnorm(10000, 1, 1)
mean(abs(sgnl5)) == abs(mean(sgnl5))
## [1] TRUE
mean(abs(sgnl6)) == abs(mean(sgnl6))
## [1] TRUE
mean(abs(sgnl7)) == abs(mean(sgnl7))
## [1] FALSE
mean(abs(sgnl8)) == abs(mean(sgnl8))
## [1] FALSE

Analysis

The bigger the sd,

  • the bigger mean(abs(signal)) - abs(mean(signal)).
  • \(\Longrightarrow\) the smaller the scaling factor.
  • the less sensitive the scaling factor is to the mean of the raw signal.

The smaller the sd,

  • the smaller mean(abs(signal)) - abs(mean(signal)).
  • \(\Longrightarrow\) the bigger the scaling factor.
  • \(\Longrightarrow\) the bigger the scaled signal.
  • the more sensitive the scaling factor is to the mean of the raw signal.

When the mean of the raw signal is not close to zero:

  • mean(abs(signal)) and abs(mean(signal)) are effectively equal for small sd.

When the mean of the raw signal is 0,

  • the scaling factor is inversely proportional to sd.

The bigger the absolute mean of the signal,

  • the less sensitive the scaling factor is to sd.

Notice:

  • Doubling the standard deviation of a mean-zero Gaussian will also double its average absolute value.
  • The expected absolute value of a Gaussian is equal to sd * sqrt(2/pi).
  • If the mean isn't zero (which for slow trend following and carry is very likely), then the standard deviation will be biased downwards compared to abs(), although that would be easy to fix by using MAD rather than STDEV.
  • These points are from this discussion:
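The first two bullets can be verified numerically (a sketch; seed and sample size are arbitrary):

```r
set.seed(42)
x1 <- rnorm(100000, 0, 1)
x2 <- rnorm(100000, 0, 2)
## Doubling the sd of a mean-zero Gaussian doubles the mean absolute value
mean(abs(x2)) / mean(abs(x1))  ## close to 2
## The expected absolute value is sd * sqrt(2/pi)
mean(abs(x1)) - sqrt(2 / pi)   ## close to 0
```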

mean = 1

lin-lin

Remove the signal with sd=1

Means

## [1] 8.567769
## [1] 10.00221
## [1] 9.999942
## [1] 10
## [1] 10

This looks much more reasonable!

Diversification Multipliers

Functions

f_sig_div_mult <- function(
    signal_correlations,
    signal_weights,
    min_cor = 0,
    max_sdm = 2.5) {
  H <- signal_correlations
  w <- signal_weights

  clamped_H <- clamp_matrix_lower(H, min_cor)
  clamp_signal_upper(
    1/sqrt(crossprod(t(w %*% clamped_H),  w)),
    max_signal = max_sdm
  )
}

f_signal_cor_mat <- function(
    signals
) {
  cor(signals)
}

f_signal_cov_mat <- function(
    signals
) {
  cov(signals)
}

f_normalization_factor <- function(raw_signal, target = 1) {
  target/mean(abs(raw_signal))
}

f_normalize_signal <- function(signal, normalization_factor = 1) {
  signal * normalization_factor
}
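A quick illustration of these two helpers together (hypothetical data, not part of the original calculations): dividing by the realised mean absolute value and multiplying by the target hits the target exactly, by construction.

```r
set.seed(7)
raw <- rnorm(100, 0, 2)
nf <- f_normalization_factor(raw, target = 10)
mean(abs(f_normalize_signal(raw, nf)))
## [1] 10
```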

clamp_signal <- function(signal, min_signal = -Inf, max_signal = Inf) {
  #if(length(signal) != 1) {stop("clamp_signal() only takes a single number
  #                              as input.")}
  vapply(
    signal,
    function(x) {max(min_signal, min(x, max_signal))},
    numeric(1)
  )
}
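For example (a quick sanity check, not from the original notebook), values outside the bounds are clipped and values inside pass through unchanged:

```r
clamp_signal(c(-25, -5, 0, 5, 25), min_signal = -20, max_signal = 20)
## [1] -20  -5   0   5  20
```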

clamp_matrix_lower <- function(input_matrix, min_signal) {
  apply(input_matrix,
        c(1,2),
        clamp_signal_lower,
        min_signal = min_signal
  )
}

clamp_signal_lower <- function(signal, min_signal = -Inf) {
  max(min_signal, signal)
}

clamp_signal_upper <- function(signal, max_signal = Inf) {
  min(signal, max_signal)
}

## This function is modified, so we don't have to load from a system list.
## Only one instrument
combine_signals <- function(
    clamped_signals,
    signal_weights,
    min_cor = 0,
    max_sdm = 2.5,
    sdm_mode = "div_mult",
    cor_or_cov = "cor") {

    ## Calculate signal correlations.
    ## No need to do this every time we run update_position_table_row.
    clamped_signals_mat <- sapply(clamped_signals, function(x) x)
    
    if(cor_or_cov == "cor") {
      signal_cor_mat <- f_signal_cor_mat(clamped_signals_mat)
    } else if (cor_or_cov == "cov") {
      signal_cor_mat <- f_signal_cov_mat(clamped_signals_mat)
    }
      
    combined_signals <- clamped_signals_mat %*% signal_weights
    
    ## Signal Diversification Multiplier
    if(sdm_mode == "div_mult") {
      sdm <- f_sig_div_mult(
        signal_cor_mat, 
        signal_weights
      )
      print(paste("sdm: ", sdm))
    } else if(sdm_mode == "var") {
      n <- length(combined_signals)
      bias_correction <- sqrt((n - 1)/n)
      sdm <- 1/(sd(combined_signals) * bias_correction)
      print(paste("sdm: ", sdm))
    }
    combined_signals <- combined_signals * sdm

    combined_signals
}

Calculations

Generate 5 signals

set.seed(876234)
signals <- lapply(1:5, function(x) rnorm(200, 0, 1))

Calculate normalization factors

norm_factors <- lapply(signals, function(x) {f_normalization_factor(x, target = 10)})
norm_factors
## [[1]]
## [1] 12.03534
## 
## [[2]]
## [1] 13.07903
## 
## [[3]]
## [1] 12.46417
## 
## [[4]]
## [1] 12.77098
## 
## [[5]]
## [1] 12.65501

Normalize signals
Show mean of absolute normalized signals (target = 10)

normalized_signals <- list()
for(i in seq_along(signals)) {
  normalized_signals[[i]] <- f_normalize_signal(signals[[i]], norm_factors[[i]])
}

lapply(normalized_signals, function(x) mean(abs(x)))
## [[1]]
## [1] 10
## 
## [[2]]
## [1] 10
## 
## [[3]]
## [1] 10
## 
## [[4]]
## [1] 10
## 
## [[5]]
## [1] 10

Clamped signals
Show mean of absolute clamped signals (target = 10)

clamped_signals <- lapply(
  normalized_signals,
  function(x) {clamp_signal(x, min_signal = -20, max_signal = 20)}
)

lapply(clamped_signals, function(x) mean(abs(x)))
## [[1]]
## [1] 9.48431
## 
## [[2]]
## [1] 9.566417
## 
## [[3]]
## [1] 9.448788
## 
## [[4]]
## [1] 9.479629
## 
## [[5]]
## [1] 9.421413

Observe:
Because we clamped the signal, the mean abs no longer hits the target.
This is ok, but it means that the rescaled combined signal will also not meet the target.

Combine signals.
Show mean absolute value of combined signal.

signal_weights <- rep(1/5, 5)
combined_signal <- combine_signals(
  clamped_signals,
  #normalized_signals,
  signal_weights
)
## [1] "sdm:  2.15737878808594"
mean(abs(combined_signal))
## [1] 8.425561

Conclusion

The mean absolute combined signal is a bit below the target, because of the clamping.

Test function

div_test_funk <- function(
    m, 
    s,
    min_signal = -20,
    max_signal = 20,
    min_cor = 0,
    max_sdm = 2.5,
    sdm_mode = "div_mult",
    cor_or_cov = "cor"
    ) {
  signals <- lapply(1:5, function(x) rnorm(200, m, s))
  norm_factors <- lapply(signals, function(x) {f_normalization_factor(x, target = 10)})
  normalized_signals <- list()
  for(i in seq_along(signals)) {
    normalized_signals[[i]] <- f_normalize_signal(signals[[i]], norm_factors[[i]])
  }
  clamped_signals <- lapply(
    normalized_signals,
    function(x) {clamp_signal(x, min_signal = -20, max_signal = 20)}
  )
  signal_weights <- rep(1/5, 5)
  combined_signal <- combine_signals(
    clamped_signals,
    #normalized_signals,
    signal_weights,
    sdm_mode = sdm_mode,
    min_cor = min_cor,
    max_sdm = max_sdm,
    cor_or_cov = cor_or_cov
  )
  list(signals = signals, 
   norm_factors = norm_factors, 
   normalized_signals = normalized_signals, 
   clamped_signals = clamped_signals, 
   signal_weights = signal_weights, 
   combined_signal = combined_signal
  )
}
div_test <- div_test_funk(
  m = 0, 
  s = 1,
  min_signal = -20,
  max_signal = 20,
  min_cor = 0,
  max_sdm = 2.5
)
## [1] "sdm:  2.10490761371908"
mean(abs(div_test$combined_signal))
## [1] 8.555264
div_test <- div_test_funk(
  m = 0, 
  s = 1,
  min_signal = -Inf,
  max_signal = Inf,
  min_cor = -Inf,
  max_sdm = Inf
)
## [1] "sdm:  2.11752460057426"
mean(abs(div_test$combined_signal))
## [1] 8.899273

Observation:
Removing the restrictions doesn’t get us closer to the target.

Let’s try different mean and sd of the raw signal:

div_test <- div_test_funk(
  m = -1, 
  s = 4,
  min_signal = -Inf,
  max_signal = Inf,
  min_cor = -Inf,
  max_sdm = Inf
)
## [1] "sdm:  2.14582085809046"
mean(abs(div_test$combined_signal))
## [1] 9.527018

Observation:
So this phenomenon has to do with the relation between the mean(signal) and mean(abs(signal)).

Compare with 1/sd() instead of \(1/\sqrt{w^T H w}\)

div_test <- div_test_funk(
  m = -1, 
  s = 4,
  min_signal = -Inf,
  max_signal = Inf,
  min_cor = -Inf,
  max_sdm = Inf,
  sdm_mode = "var"
)
## [1] "sdm:  0.205588524468273"
mean(abs(div_test$combined_signal))
## [1] 0.9514696
head(div_test$combined_signal)
##            [,1]
## [1,] -0.8574427
## [2,] -1.3620430
## [3,]  0.7177668
## [4,] -1.7233731
## [5,] -1.4774895
## [6,] -0.7929606
div_test <- div_test_funk(
  m = -1, 
  s = 4,
  min_signal = -Inf,
  max_signal = Inf,
  min_cor = -Inf,
  max_sdm = Inf,
  sdm_mode = "div_mult",
  cor_or_cov = "cor"
)
## [1] "sdm:  2.12341734942083"
mean(abs(div_test$combined_signal))
## [1] 9.161696
div_test <- div_test_funk(
  m = 0.1, 
  s = 0.1,
  min_signal = -Inf,
  max_signal = Inf,
  min_cor = -Inf,
  max_sdm = Inf,
  sdm_mode = "div_mult", #"div_mult" or "var"
  cor_or_cov = "cor" #"cor" or "cov"
)
## [1] "sdm:  2.1017176189473"
mean(abs(div_test$combined_signal))
## [1] 17.42174

Half-normal distribution and Mean Absolute Value

The expected absolute value of a normal random variable with mean \(\mu=0\) and standard deviation \(\sigma\) is \(\sigma \sqrt{\frac{2}{\pi}}\).
See Half-normal distribution:
https://en.wikipedia.org/wiki/Half-normal_distribution
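A short derivation of this standard result for a mean-zero normal variable \(X\) (the density is symmetric, so the integral over the positive half-line is doubled):

\[E|X| = \int_{-\infty}^{\infty} |x| \frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/(2\sigma^2)}\, dx = \frac{2}{\sigma\sqrt{2\pi}} \int_0^{\infty} x\, e^{-x^2/(2\sigma^2)}\, dx = \frac{2\sigma^2}{\sigma\sqrt{2\pi}} = \sigma\sqrt{\frac{2}{\pi}}\]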

std_norm_signal <- rnorm(1000000, 0, 1)

sd_pop <- function(x) {
  n <- length(x)
  bias_factor <- sqrt((n - 1) / n)  ## population sd = sample sd * sqrt((n - 1)/n)
  sd(x) * bias_factor
}

sqrt(2/pi) * sd_pop(std_norm_signal)
## [1] 0.798431
mean(abs(std_norm_signal))
## [1] 0.7988519

How does the absolute difference between the two methods depend on the number of observations?
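One way to look at this (a sketch; the sample sizes and seed are arbitrary assumptions):

```r
set.seed(99)
for (n in c(100, 1000, 10000, 100000)) {
  x <- rnorm(n, 0, 1)
  ## Absolute difference between the half-normal formula and the empirical MAV
  print(c(n = n, abs_diff = abs(sqrt(2 / pi) * sd(x) - mean(abs(x)))))
}
```

Both quantities estimate the same expected absolute value, so the difference tends to shrink as n grows.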

SDM

Signal diversification multiplier

signal_1 <- round(rnorm(1000), 2)
signal_2 <- round(rnorm(1000), 2)
signals <- data.frame(
  x1 = signal_1,
  x2 = signal_2
)
cov(signal_1, signal_2)
## [1] 0.02911757
cov(signals)
##            x1         x2
## x1 0.93970826 0.02911757
## x2 0.02911757 0.98144864
cov(signal_1, signal_1)
## [1] 0.9397083

Convert from correlation matrix to covariance matrix:
Multiply cor matrix on both sides by diagonal matrix with sd’s in diagonal.

D <- diag(c(sd(signal_1), sd(signal_2)))
D %*% cor(signals) %*% D
##            [,1]       [,2]
## [1,] 0.93970826 0.02911757
## [2,] 0.02911757 0.98144864
## compare:
cov(signals)
##            x1         x2
## x1 0.93970826 0.02911757
## x2 0.02911757 0.98144864

Convert from covariance matrix to correlation matrix:
Multiply cov matrix on both sides by diagonal matrix with inverse sd’s in diagonal.
Get sd’s from diagonals in cov matrix.
Note: diag(<vector>) produces a diagonal matrix. diag(<matrix>) produces a vector of the diagonal.

D_inv <- solve(
  sqrt(
    diag(
      diag(
        cov(signals)
      )
    )
  )
) ## Inverse
D_inv %*% cov(signals) %*% D_inv
##            [,1]       [,2]
## [1,] 1.00000000 0.03031969
## [2,] 0.03031969 1.00000000
## compare:
V <- diag(cov(signals)) ## Get variances from cov matrix

cov(signals)/V ## Recycles V down the columns (divides row i by V[i]) - NOT the correct normalization; note the asymmetry below
##            x1         x2
## x1 1.00000000 0.03098575
## x2 0.02966795 1.00000000
## compare:
cor(signals)
##            x1         x2
## x1 1.00000000 0.03031969
## x2 0.03031969 1.00000000
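Base R's stats package also provides cov2cor(), which performs this covariance-to-correlation conversion directly:

```r
cov2cor(cov(signals))
## Equal to cor(signals) up to floating-point error
```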
H <- cor(signals)
V <- cov(signals)
signals_mat <- as.matrix(signals)
W <- c(0.3, 0.7)
comb_signals <- signals_mat %*% W ## X_hat

Check that \(\sigma[\hat{X}] \cdot K = \text{MAV} [\hat{X}]\).
When \(\text{mean}(\hat{X})=0\), \(K = \sqrt{2/\pi}\).
Here, the mean of \(\hat{X}\) is not zero, so we shift \(\hat{X}\) to get a zero mean.
Now the expected difference is zero.
0.01504096 is close to zero. Close enough?

## 1)
K <- sqrt(2/pi)
## Shift to zero mean
comb_sig_demean <- comb_signals - mean(comb_signals)
## The expected difference is zero
(sd(comb_sig_demean) * K) - mean(abs(comb_sig_demean))
## [1] 0.01504096

These should be identical

## 2a)
sqrt(var(comb_signals))[1,1]
## [1] 0.7600743
## 2b)
sd(comb_signals)
## [1] 0.7600743
## 3)
sqrt(
  crossprod(
    t(W %*% V),  
    W
  )
)[1,1]
## [1] 0.7600743
## 4)
Ws <- W * (diag(sqrt(V))) ## weighted st. devs
sqrt(
  crossprod(
    t(Ws %*% H),  
    Ws
  )
)[1,1]
## [1] 0.7600743

Observations:
* As expected, 3) and 4) show the conversion between \(\mathbf{w}^T\mathbf{V}\mathbf{w}\) and \(\mathbf{w}^T\mathbf{D}\mathbf{H}\mathbf{D}\mathbf{w}\).
* As expected, \(\mathbf{w}^T\mathbf{V}\mathbf{w}\) is equal to \(\text{Var}[\hat{X}]\), the variance of the combined signal \(\hat{X} = \mathbf{w}^T\mathbf{X}\), where \(\mathbf{X}\) is the vector of the two stochastic variables generating the two signals.

H <- cov(signals) ## Note: the covariance matrix is passed here, so f_sig_div_mult() reproduces 1/sd of the combined signal
W <- c(0.5, 0.5)
f_sig_div_mult(H, W)
## [1] 1.421556
1/(sd(as.matrix(signals) %*% W))
## [1] 1.421556
signals_mat <- as.matrix(signals)
W <- c(0.3, 0.7)
comb_signals <- signals_mat %*% W

Calculate combination of first elements in each signal manually and compare with comb_signals.

0.3 * signals[1, 1] + 0.7 * signals[1, 2]
## [1] -0.256
comb_signals[1]
## [1] -0.256

Compare manual calculation based on correlations with sd(comb_signals)

sqrt(crossprod(t(W %*% cov(signals)),  W))
##           [,1]
## [1,] 0.7600743
sqrt(var(comb_signals))
##           [,1]
## [1,] 0.7600743
sd(comb_signals)
## [1] 0.7600743
clamp_matrix_lower <- function(input_matrix, min_signal) {
  apply(input_matrix,
        c(1,2),
        clamp_signal_lower,
        min_signal = min_signal
  )
}
clamp_matrix_lower(H, 0.5)
##           x1        x2
## x1 0.9397083 0.5000000
## x2 0.5000000 0.9814486
clamp_matrix_lower(H, min_signal = 0)
##            x1         x2
## x1 0.93970826 0.02911757
## x2 0.02911757 0.98144864
H
##            x1         x2
## x1 0.93970826 0.02911757
## x2 0.02911757 0.98144864

Comparison 1: Compare standard deviation and MAV

For different distributions of \(X_i \in \{{X_i}\}_{i=1}^p\), compute
\[K = \frac{\text{MAV}[\mathbf{\hat{X}}]}{\sigma_{\hat{X}}}\]
- wrt. \(\mu\)
- wrt \(\sigma\)

where
- \(X_i\): A stochastic variable. Produces a single signal vector.
- \(\mathbf{\tilde{X}}\): A \((p \times 1)\) weighted stochastic vector:
\[\mathbf{\tilde{X}} := \mathbf{w} \circ \mathbf{X} \equiv [w_1 X_1, w_2 X_2, \ldots, w_p X_p]^T\]
- \(\hat{X} := \sum_i^p \tilde{X}_i\): Combined, weighted signal. Stochastic variable.

Signal normal distributed

p is number of rules. So p is realistically quite low. Here it’s 3.

p <- 3L
n <- 10000
m <- 0 
sigma <- 1
num_tests <- 100

The red line is \(f(x) = x\).
MAV tends to lie below sd.

How does K depend on the population mean and standard deviation of the individual signals?
Assume that the distribution for all individual signals is the same.

For now it looks like:
- K is close to 1 if (e.g):
- \(\sigma > 0.01\) and \(0 < \text{abs}(m) < 0.001\).
- \(\sigma > 1\) and \(0 < \text{abs}(m) < 0.1\).
- K close to 1 means that MAV and sd are equivalent.

How close is K to 1, when \(\mu = 0\)?

range(grid$K[grid$m == 0])
## [1] 0.7968198 0.8021653

So:
- When \(\mu = 0\), then \(K\) is between \(0.7968\) and \(0.8022\), close to \(\sqrt{2/\pi} \approx 0.7979\).

This raises the question:
- For which \(\mu\) is \(K = 1\)?

Range of optimal values of \(\mu\):

range(vals)
## [1] 1.031248e-06 3.980538e-02

Inspect the mid range.

Conclusion

  • K is close to 1 if (e.g):
    • \(\sigma > 0.01\) and \(0 < \text{abs}(m) < 0.001\).
    • \(\sigma > 1\) and \(0 < \text{abs}(m) < 0.1\).
  • K close to 1 means that MAV and sd are equivalent.
  • WARNING: K can be huge, if the mean of the individual signals is different from 0, and the standard deviation is close to 0.
  • The optimal \(\mu\) is quite unstable wrt. \(\sigma\) for different draws of \(X\).
    • Especially in the range \(\sigma \in [0.01, 4]\).
      • This means that anywhere near a standard normal distribution, \(K\) is unstable.
  • If \(\sigma\) is large (\(\geq 10\)), then optimal \(\mu\) seems stable, and therefore \(K\) should also be stable wrt. \(\sigma\).
  • For \(\sigma \geq 10\) the optimal \(\mu\) seems to be around \(0.04\).

Signal follows a skewed t-distribution

Generate a skewed t-distributed series and inspect.

m = 0
s = 1
nu = 5
xi = 0.4
n = 1000

hist(x, n = 50)

How does K depend on the population mean and standard deviation of the individual signals?
Assume that the distribution for all individual signals is the same.

p = 3
n = 10000
m = 0
s = 1
nu = 5
xi = 0.4

Conclusion

We see a similar pattern as with normal distributed signals, with the difference that K differs a bit for opposite signed values of \(\mu\).

By eye it looks like K actually tends to be bigger for normal distributed signals than for heavily skewed signals.

Signals follow different t-distributions

How does K depend on the population mean and standard deviation of the individual signals?

The signals are distributed by all permutations of the following parameter values:

p <- 3
n <- 10000
m <- c(-10, -0.1, 0, 0.1, 10)
sigma <- c(10, 1, 0.1, 0.01, 0.001)
nu <- c(3, 30)
xi <- c(0.2, 0.5, 0.8)

Conclusion

  • \(K\) doesn't look dramatically different for the mixed distributions, by visual inspection.

Comparison 2

Study to which degree the contributions of \(\mathbf{D}\) and \(K\) cancel each other out.
- \[\frac{\sqrt{\mathbf{w}^T \mathbf{D} \mathbf{H} \mathbf{D} \mathbf{w}} \cdot K}{\sqrt{\mathbf{w}^T \mathbf{H} \mathbf{w}}} = 1 \text{?}\] - Ifthe contributions from \(\mathbf{D}\) and \(K\) cancel each other out, we have:
\[= \left(\sum_i^p \tilde{X}_i \right) \frac{\text{MAV}_{\tau}}{\sqrt{\mathbf{w}^T \mathbf{\Sigma} \mathbf{w}} \cdot K}\] \[= \left(\sum_i^p \tilde{X}_i \right)\frac{\text{MAV}_{\tau}}{\sqrt{\mathbf{w}^T \mathbf{D} \mathbf{H} \mathbf{D} \mathbf{w}} \cdot K}\] \[= \left(\sum_i^p \tilde{X}_i \right)\frac{\text{MAV}_{\tau}}{\sqrt{\mathbf{w}^T \mathbf{H} \mathbf{w}}}\] - …which would mean, that the two methods are indeed identical.

Signal normal distributed

Conclusion

  • The conclusion for normal distributed signals is analogous to the one in comparison 1 above.
  • The contributions of \(\mathbf{D}\) and \(K\) cancel each other out, if (e.g.):
    • \(\sigma > 0.01\) and \(0 < \text{abs}(m) < 0.001\).
    • \(\sigma > 1\) and \(\text{abs}(m) < 0.1\).
  • WARNING: The difference can be huge, if the mean of the individual signals is different from 0, and the standard deviation is close to 0.
  • If we use \(w^THw\) instead of \(w^T\Sigma w\), the diversification multiplier can be many orders of magnitude bigger than the theoretical MAV that we are targeting, if \(\sigma\) is small and \(\mu\) is not \(0\).
  • IMPORTANT: For these reasons, the target mean absolute signal should be bigger than 1. 10 seems good, or even 100, which may also be more intuitive (200 means double, as in 200 percent).

Signal follows a skewed t-distribution

p = 3
n = 10000
nu = 5
xi = 0.4

Conclusion

  • For the heavily skewed signals, the picture is quite similar to the normal distributed data.
  • For \(\mu = 0\), the fraction is approximately equal to \(\sigma\).
  • The further \(\mu\) is from \(0\), the bigger the fraction.
  • From visual inspection: The fraction is quite stable around \(1\) when \(\mu\) is somewhere between \(0.1\) and \(1\), and \(\sigma < 1\).

Signals follow different t-distributions

The signals are distributed by all permutations of the following parameter values:

p = 3
n = 10000
m <- c(-10, -0.1, 0, 0.1, 10)
sigma <- c(10, 1, 0.1, 0.01, 0.001)
nu <- c(3, 30)
xi <- c(0.2, 0.5, 0.8)

Conclusion

  • The fraction doesn't look dramatically different for the mixed distributions, by visual inspection.

Comparison 3: MAV wrt. m

How does MAV depend on \(\mu\)?

Note: This question is examined in more detail above under “Diagnosis”.

Signal normal distributed

p = 3
n = 10000

Signal follows a skewed t-distribution

p <- 3L
n <- 10000
nu <- 5
xi <- 0.4

Signals follow different t-distributions

The signals are distributed by all permutations of the following parameter values:

p <- 3L
n <- 10000
m <- c(-10, -0.1, 0, 0.1, 10)
sigma <- c(10, 1, 0.1, 0.01, 0.001)
nu <- c(3, 30)
xi <- c(0.2, 0.5, 0.8)

Conclusion

  • The MAV doesn't look dramatically different for the mixed distributions, by visual inspection.

Comparison 4: Contribution of D

Study the contributions of \(\mathbf{D}\):
\[\frac{\sqrt{\mathbf{w}^T \mathbf{D} \mathbf{H} \mathbf{D} \mathbf{w}}}{\sqrt{\mathbf{w}^T \mathbf{H} \mathbf{w}}}\]
- How bad is it to use \(\text{cor}[X]\) instead of \(\text{cov}[X]\)?

Signal normal distributed

Conclusion

  • This shows precisely what we would expect: The contribution of \(\mathbf{D}\) is exactly the standard deviation of the individual signal, when all signals are generated from the same distribution.
  • So multiplying the combined signal by the signal diversification multiplier,
    \[\frac{1}{\sqrt{\mathbf{w}^T \mathbf{H} \mathbf{w}}}\] is equivalent to multiplying the combined signal by
    \[\frac{\sigma}{\sqrt{\mathbf{w}^T \mathbf{\Sigma} \mathbf{w}}}\]
    • So only when \(K = \sigma^{-1}\) does the SDM make strict sense.

Comparison 5: \(K\) vs \(\sigma^{-1}\)

Signal normal distributed

Conclusion

  • \(K\) is generally not equal to \(\sigma^{-1}\).
  • \(K\) is equal to \(\sigma^{-1}\) for values of \(\mu\) a bit less than \(1\), when \(\sigma\) is less than 1.